Application No.: 09/297,038    Applicant(s): SEYA

Examiner: David D. Knepper    Art Unit: 2654
-- The MAILING DATE of this communication appears on the cover sheet with the correspondence address -- 
Period for Reply 

A SHORTENED STATUTORY PERIOD FOR REPLY IS SET TO EXPIRE 3 MONTH(S) FROM 
THE MAILING DATE OF THIS COMMUNICATION. 

- Extensions of time may be available under the provisions of 37 CFR 1.136(a). In no event, however, may a reply be timely filed
after SIX (6) MONTHS from the mailing date of this communication. 

- If the period for reply specified above is less than thirty (30) days, a reply within the statutory minimum of thirty (30) days will be considered timely. 

- If NO period for reply is specified above, the maximum statutory period will apply and will expire SIX (6) MONTHS from the mailing date of this communication. 

- Failure to reply within the set or extended period for reply will, by statute, cause the application to become ABANDONED (35 U.S.C. § 133). 

- Any reply received by the Office later than three months after the mailing date of this communication, even if timely filed, may reduce any 
earned patent term adjustment. See 37 CFR 1.704(b).

Status 

1) [X] Responsive to communication(s) filed on 21 July 2003.
2a) [ ] This action is FINAL. 2b) [X] This action is non-final.

3) [ ] Since this application is in condition for allowance except for formal matters, prosecution as to the merits is
closed in accordance with the practice under Ex parte Quayle, 1935 C.D. 11, 453 O.G. 213.
Disposition of Claims 

4) [X] Claim(s) 1-12 and 24-30 is/are pending in the application.

4a) Of the above claim(s) is/are withdrawn from consideration.

5) [ ] Claim(s) is/are allowed.

6) [X] Claim(s) 1-12 and 24-30 is/are rejected.

7) [ ] Claim(s) is/are objected to.

8) [ ] Claim(s) are subject to restriction and/or election requirement.

Application Papers 

9) [ ] The specification is objected to by the Examiner.

10) [ ] The drawing(s) filed on is/are: a) [ ] accepted or b) [ ] objected to by the Examiner.

Applicant may not request that any objection to the drawing(s) be held in abeyance. See 37 CFR 1.85(a).

11) [ ] The proposed drawing correction filed on is: a) [ ] approved b) [ ] disapproved by the Examiner.

If approved, corrected drawings are required in reply to this Office action.

12) [ ] The oath or declaration is objected to by the Examiner.
Priority under 35 U.S.C. §§119 and 120 

13) [ ] Acknowledgment is made of a claim for foreign priority under 35 U.S.C. § 119(a)-(d) or (f).

a) [ ] All b) [ ] Some* c) [ ] None of:

1. [ ] Certified copies of the priority documents have been received.

2. [ ] Certified copies of the priority documents have been received in Application No. .

3. [ ] Copies of the certified copies of the priority documents have been received in this National Stage
application from the International Bureau (PCT Rule 17.2(a)).
* See the attached detailed Office action for a list of the certified copies not received.

14) [ ] Acknowledgment is made of a claim for domestic priority under 35 U.S.C. § 119(e) (to a provisional application).

a) [ ] The translation of the foreign language provisional application has been received.

15) [ ] Acknowledgment is made of a claim for domestic priority under 35 U.S.C. §§ 120 and/or 121.
Attachment(s)

1 ) O Notice of References Cited (PTO-892) 4) □ Interview Summary (PTO-413) Paper No(s). . 

2) CH Notice of Draftsperson's Patent Drawing Review (PTO-948) 5) d Notice of Informal Patent Application (PTO-152) 

3) O Information Disclosure Statement(s) (PTO-1449) Paper No(s) . 6) CD Other: 
1. Applicant's correspondence filed on 21 April 2003 and 21 July 2003 (papers #9, 10, 12
and 13) has been received and considered. Claims 1-12 and 24-30 are pending. Claims 13-23 
and 31-52 have been canceled. 



Abstract

2. The Abstract of the Disclosure is objected to because it covers material that is not
supported by the specification. The abstract indicates that the invention will separate
musical, vocal input into language (presumably, the lyrics) and accompaniment information
(presumably, the musical score). The invention then purports to translate the vocal (lyrics) and
produce a second vocal output, which is a translated version of the input. Correction is required.
See M.P.E.P. § 608.01(b).

Drawings

3. The drawings are objected to under 37 CFR 1.83(a). The drawings must show every
feature of the invention specified in the claims. Therefore, the information that is being
processed must be shown or the feature(s) canceled from the claim(s). Examples include the "vocal
information", "accompaniment", "language lyric" and "musical information." The drawings fail
to show how any type of data separation is performed. The drawings must show the data input
relied upon as well as the method for processing the data to achieve the desired result
commensurate with the description and claims.
No new matter should be entered.

The proposed changes to figures 1, 2, 4, 5 and 6 are approved. However, the changes to
figures 4-6 only show a desired result without providing any useful showing of how the
information is determined or extracted. The information itself is NOT illustrated. Only text is
provided, which is sufficient only if the information and methods (or means plus function) for
analyzing, canceling and extracting the information are obvious or well known per se.

New Matter 

4. The amendment filed 21 April 2003 is objected to under 35 U.S.C. 132 because it 
introduces new matter into the disclosure. 35 U.S.C. 132 states that no amendment shall 
introduce new matter into the disclosure of the invention. The added material which is not 
supported by the original disclosure is as follows. 

The relationship established in the Amendment filed 21 April 2003 (paper #9) contains 
new matter. This new material indicates that some sort of ID is based on a subscription list and 
is dependent on payment of a use fee (added to page 14 of the specification). No such definition 
of collation data and permission to use a terminal device was previously given. 

Applicant is required to cancel the new matter in the reply to this Office Action. 

Claims 

5. Claims 1-12 and 24-30 are rejected under 35 U.S.C. § 112, second paragraph, as being 
indefinite for failing to particularly point out and distinctly claim the subject matter which 
applicant regards as the invention. 

Claims 1 and 24 are rejected as noted below. 

The new claim language limits the input to "first vocal-containing musical number
information". The claims are confusing because they contradict the specification.

"Generating the first language lyric information by speech recognition of the first 
vocal information" indicates that someone speaks the lyrics into the device. From the antecedent
reference of the separation unit, the lyrics must be derived from speech. However, the 
accompaniment is not limited to any particular "musical information", be it vocal, instrumental 
or otherwise. This contradicts the specification such as page 23 which indicates that some 
unique form of karaoke information is required as a substitute for standard stereo audio such as 
left/right (L/R) and center information that is normally used for some types of stereo coding. 
Confusion exists because the claims and the specification fail to provide reasonable antecedent 
basis to form a meaningful relationship for interpretation of the claims. The claims do not 
indicate the need for any particular karaoke information and must therefore apply to other forms 
of input as well. 

The claims are broad enough to include the separation of any speech from any input 
audio. For example, this would include choral music and the separation of a typical 4-part 
(SATB) harmony into the desired lyrics of one or more parts (the lyrics could vary among parts). 
Other possibilities would include one or more singers accompanied by a band or orchestra in 
which separation of parts could be much more complicated between singers, instruments and/or 
other accompaniment parts. However, "vocal separation unit 212" is described on page 23 as
including "vocal canceling unit 212a" which contains a digital filter capable of erasing the vocal
part. It is unclear what relationship between "vocal information", "accompaniment information" 
and "musical information" is established and how the relationships should be defined and/or 
separated. 

It is unclear whether the claimed "language lyric information" is part of the "vocal
information" D3 or the karaoke information D2 (page 23, line 18). The combination claimed fails
to provide an antecedent basis for a clear relationship to the specification.

It is unclear whether the applicant intends to limit the invention to musical related 
information or whether the combination of data can include a wider variety of multimedia 
information. Therefore, the separation of data is interpreted to be broad enough to include 
various types of information known in the art. 

The only specific application mentioned in the specification is for karaoke related data. 
However, neither the specification nor the claims clearly describe any particular requirements for 
data structure or data format for karaoke devices and/or methods for using karaoke devices. 

6. The following is a quotation of the first paragraph of 35 U.S.C. 112:

The specification shall contain a written description of the invention, and of the manner and process of making 
and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it 
pertains, or with which it is most nearly connected, to make and use the same and shall set forth the best mode 
contemplated by the inventor of carrying out his invention. 

7. Claims 1-12 and 24-30 are rejected under 35 U.S.C. 112, first paragraph, as containing 
subject matter which was not described in the specification in such a way as to enable one skilled 
in the art to which it pertains, or with which it is most nearly connected, to make and/or use the 
invention. 

The claims (i.e. - 1 and 24) are drawn towards "an information processing apparatus".
There is no particular limitation to the application of the apparatus itself. The only limitations 
are functional. The primary limitations are the ability to accept combinations of vocal, 
accompaniment and musical information and to separate them. Once they are separated, the 
claims indicate that the vocal information portion will be further analyzed by speech recognition 
which will be used to generate lyrics and to translate lyrics from a first language to a second
language. Then the second language translation is synthesized with the accompaniment.

The specification on page 6 indicates that "the required information" is not particularly 
limited but may include various data "such as audio information, text information, image 
information or the picture information as later explained..." The specification fails to limit the 
information to any particular format or combination of data. This makes it unlikely that one of 
ordinary skill in the art could hope to predict how to process the data since there is no way to 
know what needs to be processed. The desired result of the processed data is similarly vague. 

The transmission method is not limited to any particular format. On page 8, the applicant 
states: "There is no particular limitation to the communication network 4, [figures 1 and 3] such 
that it is possible to utilize CATV (cable television, community antenna television), 
communication satellite, public telephone network or wireless communication. . ." Therefore, the 
method of transmission is all-inclusive and does nothing to define the data or its components. 

Page 9 of the specification describes "the intermediate transmission devices 2" in a 
similarly generic fashion. Figure 3 shows elements of device 2 but the description is similarly 
vague. Page 9, last paragraph indicates that devices 2 may be anywhere and are made up of "a 
display unit 203 for optionally displaying the required contents associated with the operations 
and a key actuating unit 202." Page 10 merely indicates that device 2 "is also provided with a 
terminal device attachment portion 204 for attaching the portable terminal device 3 . . . while the 
power supply terminal 206 is electrically connected to a power input terminal 307 of the portable 
terminal device 3." Page 11 merely indicates that these generic connections allow transmission 
of data and necessary power to both devices 2 and 3. 
On page 12, last paragraph, the server device is described in part as containing "an
assessment processing unit 105 for assessment processing for the user and an interfacing unit 106 
for having communication with the intermediate transmission device 2." The function of the 
"assessment processing unit 105" is undefined. Neither a description of the data being assessed
nor a description of the resultant assessment is provided to give life and meaning to these
terms.

On page 13, second paragraph, the applicant states: "a unique protocol or TCP/IP 
(Transmission Control Protocol/Internet Protocol) transmitting data generally used on the 
Internet by packets, may be used." This indicates that the applicant may employ an undefined 
"unique protocol" or a standard protocol such as TCP/IP. Because the data necessary to the 
invention is not clearly defined, the reader is unable to determine whether a "unique protocol" 
must be proprietary to achieve desired results or whether a standard protocol could really be used 
to achieve the same desired results. Even more problematic is that the desired results are 
unknown making further analysis virtually impossible. 

Applicant's reliance upon foreign [Japanese] documents H-3-139923 or 3-13922 for
teaching how to make and use TwinVQ (page 14) is acceptable since the statements indicate that 
the form of data compression is not considered inventive. However, since the data and the form 
of compression have no bounds, the burden on the applicant to provide a specific form of signal 
processing that can apply to any audio input, regardless of the type or format is that much more 
difficult to meet. 

The original specification, page 14, second full paragraph failed to indicate what data is 
"collated". Supposedly, "The terminal ID data of the portable terminal device 3" is magically 
collated with "the terminal ID data of the portable terminal device that is currently able to use the
information distribution system". How is a single device "currently able to use the 
information..." identified? Since page 13 implies use of the Internet, why would only one 
device (such as a computer) be able to use such information? The Examiner cannot resolve how 
"collation processing" is used (i.e. -"the results of collation") to decide what is "permitted". 

The relationship established in the Amendment filed 21 April 2003 (paper #9) contains 
new matter. This new material indicates that some sort of ID based on a subscription list is
required and its functionality is somehow dependent on payment of a use fee. No such definition 
of collation data and permission to use a terminal device was previously given. The new 
relationship indicating a use fee is not considered consistent with page 15, lines 9-18 which 
refers to "a charging circuit, for supplying the power to the various parts." This provided an 
antecedent for the term "electrical charging" defined as something that supplies electrical power 
rather than charging fees as the new language would imply. The following paragraph was the 
original complaint under 35 U.S.C. 112, first paragraph, against the original language.

The last paragraph of page 14 (continuing to page 15) does not explain what sort of 
"assessment" is desired. The sentence "... the assessment processing unit 105 performs the 
processing of assessment of the amount in meeting with the state of use of the information 
distribution system by the user. . ." is incoherent. Neither the data being assessed, the process by 
which assessment is performed, what "amount" (amount of what?) is utilized nor the "state of 
use" are defined. The applicant throws about these terms without giving any meaningful
definitions thereof. The example given is nonsense: "the request information for information
copying or electrical charging..." How can a request for "information copying" be treated as an
alternative to "electrical charging" (charging a battery?)? What does each of these things really
mean?

No evidence that the applicant's invention can take a song and separate portions of the 
vocal information and accompaniment (vocal and/or instrumental) is presented by the applicant. 

Page 18, second paragraph mentions "speech recognition translation unit 321" and 
"speech synthesis unit 322" but provides no details capable of actually achieving the desired 
results of either unit. 

Page 19, last paragraph, indicates that "speech recognition translation unit 321 is fed with 
the vocal information transmitted along with the karaoke information after separation by the 
vocal separation unit 212 of the intermediate transmission device 2, and performs speech 
recognition of the vocal information." Again, no details for separating desired audio (a particular
vocal) from other audio data are even offered by the applicant. Similarly, no details for
performing speech recognition and translation to another language are offered. As a minimum, 
the drawings should show the steps of analysis necessary to input typical song data and extract 
specific parameters that can be analyzed by a computer to determine the desired results. Details 
must be provided giving a reasonably detailed explanation of how one of ordinary skill in the art 
could expect successful separation, recognition and translation. 

Paper #9 presents arguments on page 7 that page 23 of the specification provides support 
for a separation unit. However, page 23 of the specification is also characterized as a
non-limiting example. Reading the specification itself, it is found that page 23 explicitly states that
"the detailed structure of the vocal canceling unit 212a is omitted." It goes on to say that "...the 
vocal canceling unit 212a generates the karaoke information D2 using the well-known technique 
of canceling the speech signals fixed at the center on stereo reproduction with the {(L channel
data)-(R channel data)}." However, the applicant has never limited the vocal information to a 
particular type of stereo data. This technique would not necessarily reside in a digital filter nor 
would a generic digital filter have the ability to isolate specific vocal information from other 
types of vocal and/or other audio information. Therefore, the applicant is required to show a 
technique for separating speech from other audio data that is commensurate in scope with the 
claims and the specification. At the very least, a specific digital filter capable of such 
functionality must be shown. The applicant is reminded of the specification on page 6 (noted 
above) which indicates that "the required information" is not particularly limited but may include 
various data "such as audio information, text information, image information or the picture 
information as later explained..."
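
For illustration only, the quoted {(L channel data)-(R channel data)} technique can be sketched as
follows. This is a minimal sketch and not the applicant's disclosed implementation: the function
name cancel_center and all sample values are hypothetical, and it assumes the vocal is mixed
identically into both stereo channels. Under that assumption the vocal term drops out of the
difference, and only the difference of the accompaniment parts remains.

```python
def cancel_center(left, right):
    """Return the per-sample difference L - R of two equal-length channels."""
    if len(left) != len(right):
        raise ValueError("channels must have the same number of samples")
    return [l - r for l, r in zip(left, right)]


# Hypothetical sample data (integers, so the arithmetic is exact).
vocal = [3, -2, 5, 0]          # assumed to be mixed equally into both channels
accomp_left = [1, 4, -1, 2]    # accompaniment component of the left channel
accomp_right = [0, 1, 2, -3]   # accompaniment component of the right channel

left = [v + a for v, a in zip(vocal, accomp_left)]
right = [v + a for v, a in zip(vocal, accomp_right)]

# (vocal + accomp_left) - (vocal + accomp_right) = accomp_left - accomp_right
karaoke = cancel_center(left, right)

# A vocal that is NOT centered (here twice as loud on the left) is not removed.
off_center_residue = cancel_center([2 * v for v in vocal], vocal)
```

In this sketch the residue for the off-center vocal equals the vocal itself, which illustrates why
the subtraction depends on a particular type of stereo data and does not isolate vocal
information in general.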

Page 20, first full paragraph, indicates that "speech synthesis unit 322 first generates the 
novel vocal information (audio data) sung with the lyric of the as-translated second language, 
based on the second language lyric information generated by the speech recognition translation 
unit 321." No details for performing such a desired manner of synthesis are provided. The 
apparatus and method for analysis as well as the parameters for modeling "original vocal 
information" must be provided. Similar details for synthesizing speech with musical properties 
must be shown that will allow utilization of "original vocal information." Further details are 
necessary to show "original vocal information" specifics with regard to music and speech. Such 
details must include time and frequency as it relates to musical pitch and vocal tract and/or other 
information that is specific to language idiosyncrasies. For example, in English, changes in pitch 
do not change the literal meaning of a word, but in certain Eastern languages (e.g., Chinese) a
rising or falling pitch could change the meaning of an otherwise identical pronunciation. Such
details provide interesting challenges to the desired results of applicant's invention. However, 
no details are provided to address these or even more basic information. 



8. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action:

(a) A patent may not be obtained though the invention is not identically disclosed or 
described as set forth in section 102 of this title, if the differences between the subject 
matter sought to be patented and the prior art are such that the subject matter as a whole 
would have been obvious at the time the invention was made to a person having ordinary 
skill in the art to which said subject matter pertains. Patentability shall not be negatived 
by the manner in which the invention was made. 



9. Claims 1-12 and 24-30 are rejected under 35 U.S.C. § 103 as being unpatentable over 
Stelovsky (5,613,909) in view of Bordeaux (4,852,170) and Lyberg (5,546,500). 

As per claims 1 and 24, "information processing" is taught by both references: 

"separating a first vocal information part in a first language and an 
accompaniment information part . . . generating first language lyric information" (Stelovsky 
teaches the separation of lyrics in figure 5 - see also column 9, lines 12-21 where he indicates 
the lyrics can be viewed separately from the music and accompanying video by using separate 
tracks that are common in karaoke applications); 

"translating the generated first language letter information into the second
language letter information" (suggested in column 14, lines 21-22 using direct translation into
another language); and

"synthesizing the second language lyric information" (suggested in
column 14, lines 18-19 where he teaches that the audio track can be generated rather than
recorded (e.g. using a speech generator) - see also col. 14, lines 22-24, which teaches that any of the
tracks of presentation can be generated remotely and transmitted using any existing
communication means. Thus, it would have been obvious to combine or otherwise synthesize
any known combinations of the data).

It is noted that Stelovsky does not explicitly teach the use of "speech recognition" to
perform translation. However, he teaches that translation is obvious in combination with a
karaoke or other multimedia separation of data elements in order to facilitate education and/or
entertainment. Bordeaux and Lyberg teach details for performing speech recognition and in
column 12, lines 60-65, Bordeaux teaches that for use in foreign languages ... a different natural
language or orthographic translator would be employed. It would have been obvious for a
person having ordinary skill in the pertinent art, at the time the invention was made, to combine a
speech recognition based translator such as Bordeaux with the device of Stelovsky because
Stelovsky specifically invites the use of future facilities (col. 14, line 11) which include
translation into other languages as noted above. Lyberg explicitly cites Bordeaux in column 1,
line 24 and is utilized because he clearly teaches that it is known to combine synthesis with a
translation device (see abstract) in such a way as to preserve prosodic information even after
translation to further improve translation. Thus, Lyberg teaches that a combination of speech
recognition for translation to a second language can be used to preserve stresses in the first
language (abstract).

Claims 2-12 and 25-30 are rejected under similar arguments as presented above.
Although the claims are unclear, it is presumed that the applicant is attempting to limit some of
the synthesis related elements to preserving information gathered during analysis or recognition.
This is taught by Lyberg who preserves prosody following recognition and translation for use in
synthesis.

Bordeaux teaches details regarding the identification of phoneme strings (words)
to be translated into any natural language. Lyberg teaches that such translation devices can
separate the language and prosody in order to allow preservation of the prosody after translation.

Remarks

10. The applicant's remarks make generic statements that changes were made in response to
the rejections. However, it is not seen how any of the changes actually overcome the
deficiencies other than the typing error calling the separation unit a storage unit. The rejections
have been modified according to the new phraseology in the claims.

The argument on page 16 against the prior art does not address the fact that Stelovsky
shows that his device can separate lyrics from musical information for use in a karaoke device.
This is significant because the applicant indicates in the specification that the invention is related
to a karaoke device but the claims are framed in a much broader sense to cover other uses of
audio.

Stelovsky suggests translation into another language in column 14, lines 21-22 and both
Lyberg and Bordeaux show that translation can be done using speech recognition. Stelovsky
also teaches that his invention is designed to be flexible to incorporate future improvements.
Specifically, he is able to allow programming to utilize recorded or generated audio to include a
speech generator (col. 14, lines 11-20). Thus, the combination of references teaches that it is
obvious to perform translation using text and/or speech recognition to improve speech
comprehension and improve the usefulness (Industrial Applicability) of Stelovsky's system.

11. Any response to this action should be mailed to:

Commissioner of Patents and Trademarks 
Washington, D.C. 20231 

or faxed to: 

TC2600 Fax Center 
(703) 872-9314 

Hand-delivered responses should be brought to Crystal Park II, 2121 Crystal Drive, Arlington,
VA, Sixth Floor (Receptionist).

12. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to David D. Knepper whose telephone number is (703) 305-9644. 
The examiner can normally be reached on Monday-Thursday from 07:30 a.m.-6:00 p.m. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Richemond Dorvil, can be reached on (703) 305-9645. 

Any inquiry of a general nature or relating to the status of this application should be 
directed to customer service at (703) 306-0377. 

The facsimile number for TC 2600 is (703) 872-9314.




David D. Knepper 
Primary Examiner 
Art Unit 2654 



